Tight Lower Bounds for Query Processing on Streaming and External Memory Data

Authors

  • Martin Grohe
  • Christoph Koch
  • Nicole Schweikardt
Abstract

It is generally assumed that databases have to reside in external, inexpensive storage because of their sheer size. Current technology for external storage systems presents us with a reality that performance-wise, a small number of sequential scans of the data is strictly preferable over random data accesses. Database technology, and query processing technology in particular, has developed around a notion of memory hierarchies with layers of greatly varying sizes and access times. It seems that the current technologies scale up to their tasks and are very successful, but on closer investigation it may appear that our theoretical understanding of the problems involved, and of optimal algorithms for these problems, is not quite as developed. Recently, data stream processing has become an object of study by the database management community, but from the viewpoint of database theory, this is really a special case of the query processing problem on data in external storage where we are limited to a single scan of the input data. In the present paper we study a clean machine model for external memory and stream processing. We establish tight bounds for the data complexity of Core XPath evaluation and filtering. We show that the number of scans of the external data induces a strict hierarchy (as long as internal memory space is sufficiently small, e.g., polylogarithmic in the size of the input). We also show that neither joins nor sorting are feasible if the product of the number r(n) of scans of the external memory and the size s(n) of the internal memory buffers is sufficiently small, i.e., of size o(n).

Preprint submitted to Elsevier Science, 6 November 2006
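The trade-off between the number of scans r(n) and the buffer size s(n) mentioned in the abstract can be illustrated with a small sketch. The code below is not taken from the paper and does not implement its machine model; it is a hypothetical multi-pass selection sort over a read-only stream, under the assumption that output can be appended to a separate write-only stream. The function name `multipass_sort` and the `scan` callback are illustrative. With a buffer of s items it needs about n/s sequential scans, so the product of scans and buffer size stays near n, which is the regime the stated lower bound says cannot be substantially improved.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def multipass_sort(scan: Callable[[], Iterable[int]], s: int) -> Tuple[List[int], int]:
    """Sort a read-only stream with a buffer of at most s items.

    `scan()` must return a fresh sequential pass over the same data.
    Returns (sorted items, number of scans used). The returned list
    stands in for an append-only output stream (an assumption of this
    sketch); it is never read back during sorting.
    """
    if s < 1:
        raise ValueError("buffer size s must be at least 1")
    out: List[int] = []
    last = None      # largest (value, position) key emitted so far
    n = None         # input length, learned during the first scan
    scans = 0
    while n is None or len(out) < n:
        scans += 1
        heap: List[Tuple[int, int]] = []  # max-heap via negated keys, size <= s
        count = 0
        for pos, value in enumerate(scan()):
            count += 1
            key = (value, pos)            # position breaks ties between duplicates
            if last is not None and key <= last:
                continue                  # already emitted in an earlier scan
            if len(heap) < s:
                heapq.heappush(heap, (-value, -pos))
            elif key < (-heap[0][0], -heap[0][1]):
                # key is smaller than the largest buffered key: swap it in
                heapq.heapreplace(heap, (-value, -pos))
        n = count
        batch = sorted((-v, -p) for v, p in heap)
        out.extend(v for v, _ in batch)
        if batch:
            last = batch[-1]
    return out, scans


if __name__ == "__main__":
    data = [5, 3, 8, 1, 9, 2, 7]
    print(multipass_sort(lambda: data, 3))
```

Each scan emits the next s smallest items, so a non-empty input of n items takes ceil(n/s) scans. Shrinking the buffer forces proportionally more scans in this sketch.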

Similar resources

The Complexity of Querying External Memory and Streaming Data

We review a recently introduced computation model for streaming and external memory data. An important feature of this model is that it distinguishes between sequentially reading (streaming) data from external memory (through main memory) and randomly accessing external memory data at specific memory locations; it is well-known that the latter is much more expensive in practice. We explain how ...

Input/Output Streaming Complexity of Reversal and Sorting

This work revisits the study of streaming algorithms where both input and output are data streams. While streaming algorithms with multiple streams have been studied before, such as in the context of sorting, most assumed very nonrestrictive models and thus had weak lower bounds. We consider data streams with restricted access, such as read-only and write-only streams, as opposed to read-write ...

Earliest Query Answering for Deterministic Streaming Tree Automata and a Fragment of XPath

We study the concept of earliest query answering as needed for streaming XML processing with optimal memory management. We derive lower complexity bounds showing that earliest query answering for Forward XPath is not feasible in polynomial time combined complexity except if P=NP. We then distinguish a fragment of Forward XPath with negation that enjoys P-time earliest query ...

Design and Test of the Real-time Text mining dashboard for Twitter

One of today's major research trends in the field of information systems is the discovery of implicit knowledge hidden in datasets that are currently being produced at high speed, in large volumes, and in a wide variety of formats. Data with such features is called big data. Extracting, processing, and visualizing huge amounts of data has today become one of the concerns of data science scholar...

Asymmetric Communication Complexity and Data Structure Lower Bounds

You can think of the cell probe model as having a CPU and some random access memory (RAM). The state of your data structure is encoded in the memory cells of the RAM and the CPU must answer some query by accessing certain memory cells. For the rest of the lecture s will denote the number of memory cells and w will denote the number of bits in each memory cell. The only thing you are charged for...

Journal:
  • Theor. Comput. Sci.

Volume 380  Issue 

Pages  -

Publication date 2005